Abstract

The Partial Credit Model (PCM) is sometimes interpreted as 
a model for stepwise solution of polytomously scored items, where 
the item parameters are interpreted as difficulties of the steps. It 
is argued that this interpretation is not justified. A model for 
stepwise solution is discussed. It is shown that the PCM is suited 
to model sums of binary responses which are not supposed to be 
stochastically independent. As a practical result, a statistical test 
of stochastic independence in the Rasch model is derived. 


1 Introduction 

Masters (1982) introduced the partial credit model (PCM) as an IRT 
model for polytomous items with ordered categories. The rationale he 
used to introduce the model was based on a response process where 
the subject responds sequentially to a number of subproblems in the 
item. The partial credit given equals the number of steps completed 
successfully, which of course in this rationale should be the first steps.
This rationale, together with the tempting conclusion that the location
parameters in the PCM could be interpreted as difficulty parameters of
the respective steps, was criticized by Molenaar (1983), who argued that
the steps interpretation in the PCM is not justified.

This leaves two important questions: 

1. If the PCM is not suited as a formalization of the steps rationale, 
does there exist other models which can be used for this purpose? 

2. Does there exist a compelling rationale that justifies the use of the
PCM?

The first question will be addressed briefly in Section 2, where it is
explained in some detail why the steps interpretation is not justified in
the PCM and where another model, especially designed to allow for such
an interpretation, is discussed.

The second question, however, is the central focus of the present
article: it investigates the relation between the Rasch model and the
PCM. This is done in a number of stages. In the first stage (Section 3) it
is shown that if a test complies with the Rasch model it also complies with
the PCM in the sense that subsets of the items, called testlets, are
considered as polytomous items with a score equal to the sum score on the
items in the testlet. The converse, however, does not hold: if response
patterns consisting of testlet scores comply with the PCM, it does not
follow that the Rasch model holds at the level of the individual items.
More generally, the PCM is a much more general model than the Rasch model.

In the next stage (Section 4), a general model for binary items is
introduced, where it is possible to allow for a large number of interactions.
The Rasch model is a special case of this general family. In the Rasch
model all interactions vanish, and consequently it is the unique member
of this family where conditional independence between all item responses
exists. Two theoretical results are presented for the relation between this
model and the partial credit model, applied to testlet scores. The first
result (Section 4.1) is that each member of this family complies with the
PCM and the second result (Section 4.2) says that every PCM applied
to testlet scores can be considered as a model for sums of binary item
scores and thus complies with the general dependence model. The scientific
relevance of this finding resides in the fact that the PCM is a suitable
model for tests of binary items where the condition of local independence
is not met, without the necessity to explicitly model the precise form of
the interaction effects.

In Section 5, two practical implications of this approach are
investigated. The first gives an answer to the question whether, in
estimating individual abilities of test takers, information is lost if the
partial credit model is used in case the Rasch model holds (Section 5.1).
The second implication relates to a general condition that has to be
fulfilled for the results of Section 4 to be valid. This condition is that
testlet scores must be locally independent. In Section 5.2 two methods are
discussed to create testlets where there is within testlet dependency but
no between testlet dependency.

The article is concluded by a discussion section. 


2 The step interpretation of the Partial 
Credit model 


The definition of the PCM states that for an item with maximum score $m$,

$$P(X = j \mid \theta, X = j \text{ or } X = j-1) = \frac{\exp(\theta + \beta_j)}{1 + \exp(\theta + \beta_j)}, \tag{1}$$

where $\theta$ is the latent variable, and $X$ the item score with values
$j = 0, \ldots, m$. The parameters $\beta_j$ denote the $m$ parameters
associated with the item. Now suppose we construct the following two-step
item


two step item: 


1/2 + 0.25 
003 


which of course will lead to a completely correct response only if the
first step (the addition) and the second step (the division) are computed
correctly. We can embed this item into a three-step item, where the third
step can only be applied if the first two steps are completed. We can also
vary the difficulty of the third step, which we do as an example in the
following three versions of the three-step item:

version A: -|- 1 =? 

version B; =? 

version C: ^ exp(-a;V2)da; =? 


For 15 year old students, we may safely say that step 3 in version A is
trivially simple, while the third step of version C will be extremely
difficult, and will be solved only by a few mathematically gifted students.
The third step of version B is probably not trivially easy in that age
group, but one can assume that a substantial proportion of the population
masters the concept of the square root function. The step interpretation of
the PCM implies that the values of $\beta_1$ and $\beta_2$ will be equal for
the three versions of the three-step item. But this is not consistent with
(1), as will be shown by the following example, where we concentrate on
$\beta_2$ and on the item versions B and C.

Consider the population of all persons with $\theta = \theta_0$. In view of
the interpretation given to the items, the response probabilities in
Table 1 might hold.


Table 1: Response probabilities at $\theta = \theta_0$

score:        0      1      2         3
version B     0.1    0.45   0.15      0.3
version C     0.1    0.45   0.44999   0.00001


Note that the probabilities of obtaining a score of 0, 1 and (2 or more)
are the same for both versions; in version B, however, 2/3 of the students
who have successfully reached step 2 can also solve step 3, while in
version C almost nobody is successful on step 3. For version C, the
probability of a score of 2, given that the score is 1 or 2, is very close
to one half, whence it follows from (1) that $\beta_2$ will be very close
to $-\theta_0$. In version B, however, the conditional probability of
obtaining a score of 2, given that the score is 1 or 2, is 0.25, whence it
follows, using (1), that $\beta_2 = -(\theta_0 + \ln 3)$. This shows that
the value of $\beta_2$ does not depend uniquely on the difficulty of the
second step but also on the difficulty of the subsequent step(s), and
consequently that any interpretation of PCM parameters as difficulties of
specific item steps is void.
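
The arithmetic behind this example can be checked with a short sketch. It takes the probabilities of Table 1 and inverts (1), so that the logit of the conditional probability of score 2 given score 1 or 2 gives $\theta_0 + \beta_2$. The names `table1` and `beta_offset` are illustrative and not part of the original text.

```python
import math

# Response probabilities at theta = theta_0, copied from Table 1
# (index = item score 0..3).
table1 = {
    "B": [0.1, 0.45, 0.15, 0.3],
    "C": [0.1, 0.45, 0.44999, 0.00001],
}


def beta_offset(probs, j):
    """Return theta_0 + beta_j implied by (1).

    By (1), P(X = j | X = j or X = j-1) is a logistic function of
    theta + beta_j, so its logit recovers theta_0 + beta_j.
    """
    p = probs[j] / (probs[j - 1] + probs[j])
    return math.log(p / (1.0 - p))


offset_b = beta_offset(table1["B"], 2)  # -ln 3, so beta_2 = -(theta_0 + ln 3)
offset_c = beta_offset(table1["C"], 2)  # close to 0, so beta_2 is close to -theta_0
print(offset_b, offset_c)
```

Although the second step is the same in both versions, the implied values of $\beta_2$ differ by about $\ln 3$, which is the point of the example.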

The conclusion is that the PCM is not suitable to model sequential
solution strategies. An appropriate model was found independently at
two different places at about the same time. De Vries (1988) and
Verhelst, Glas and De Vries (1997) developed a model by combining the
simple Rasch model with a subject controlled incomplete design; the
steps or subitems of a polytomously scored item are conceived as being
administered in a fixed sequence and the next subitem is presented if and
only if the previous one is correctly responded to. The answer to each
subitem is modeled by the simple Rasch model. The presentation of a
subitem thus depends on the behavior of the responding subject, hence
the qualification subject controlled. Tutz (1990, 1997) followed the same
rationale, but introduced the model formally and more generally as

$$p_j = P(X > j \mid \theta, X \geq j) = F(\theta + \beta_j), \quad (j = 0, \ldots, m-1), \tag{2}$$

where $F(\cdot)$ is an arbitrary distribution function. It can readily be
seen that in both models, the category response functions are given by

$$P(X = j \mid \theta) = \begin{cases} (1 - p_j) \prod_{g=0}^{j-1} p_g & \text{if } j < m, \\ \prod_{g=0}^{m-1} p_g & \text{if } j = m, \end{cases} \tag{3}$$

whence it follows that both models are identical if $F$ is the logistic
distribution function with argument $\theta + \beta_j$.
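
As a minimal sketch of the category response functions of the sequential model, the following code accumulates the probability of passing each successive step and assigns the score at which the first failure occurs. The function names and the example parameter values are hypothetical, and the logistic choice for $F$ is only the default.

```python
import math


def logistic(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def sequential_category_probs(theta, betas, F=logistic):
    """Category probabilities P(X = j | theta), j = 0..m, for the steps model.

    betas[j] is the parameter of the step taken from score j to score j+1,
    so that p_j = F(theta + betas[j]) as in (2).
    """
    probs = []
    reach = 1.0  # probability that all earlier steps were passed
    for beta in betas:
        p = F(theta + beta)
        probs.append(reach * (1.0 - p))  # first failure occurs at this step
        reach *= p
    probs.append(reach)  # all m steps passed: score m
    return probs


# Hypothetical three-step item.
probs = sequential_category_probs(0.0, [1.0, 0.0, -1.0])
print(probs)
```

In this model, changing the parameter of a later step leaves the probability of passing an earlier step unchanged, which is what the steps rationale requires.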

3 The distribution of sums of Rasch item
scores

Suppose $m$ (> 1) items can be described by the Rasch model, i.e. for any
value of the latent variable $\theta$,

$$P(Y_i = y_i \mid \theta) \propto \exp[y_i(\theta + \beta_i)], \quad (i = 1, \ldots, m), \tag{4}$$

where $y_i \in \{0, 1\}$.

Defining the variable $S$ as $S = \sum_i Y_i$, and assuming conditional
independence as usual, it is readily seen that

$$P(S = s \mid \theta) \propto \exp(s\theta) \sum_{\sum y = s} \prod_i \varepsilon_i^{y_i}, \tag{5}$$

where $\varepsilon_i = \exp(\beta_i)$. The combinatorial function
represented by the sum in the right-hand side of (5) is known as the basic
or elementary symmetric function (of order $s$) of the multivariate
argument $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_m)$, and will be
denoted by $\gamma_s(\varepsilon)$. It is defined formally as

$$\gamma_s = \gamma_s(\varepsilon) = \sum_{\sum y = s} \prod_i \varepsilon_i^{y_i}, \quad (s = 0, \ldots, m). \tag{6}$$

Note that $\gamma_0(\varepsilon) = 1$. Defining

$$\eta_s = -\ln \gamma_s(\varepsilon), \quad (s = 0, \ldots, m), \tag{7}$$

equation (5) can be rewritten as

$$P(S = s \mid \theta) \propto \exp(s\theta - \eta_s), \tag{8}$$

which is nothing more than the category response function of the PCM in
a parameterization first used by Andersen (1977). Notice that $\eta_0$
equals zero.
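
The equivalence of (5) and (8) can be illustrated numerically. The sketch below computes the elementary symmetric functions with the usual summation recursion and compares the sum score distribution obtained by brute-force enumeration of Rasch response patterns with the PCM form. All function names and the example parameter values are assumptions made for this illustration.

```python
import math
from itertools import product


def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_m] for the list eps, as defined in (6)."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        # Update from high to low order so each item is used at most once.
        for s in range(len(eps), 0, -1):
            gamma[s] += e * gamma[s - 1]
    return gamma


def sum_score_dist_rasch(theta, betas):
    """Sum score distribution by enumerating all Rasch response patterns."""
    m = len(betas)
    dist = [0.0] * (m + 1)
    for y in product((0, 1), repeat=m):
        prob = 1.0
        for yi, beta in zip(y, betas):
            p_correct = 1.0 / (1.0 + math.exp(-(theta + beta)))
            prob *= p_correct if yi else 1.0 - p_correct
        dist[sum(y)] += prob
    return dist


def sum_score_dist_pcm(theta, betas):
    """Sum score distribution from (8): proportional to exp(s*theta - eta_s)."""
    gamma = elementary_symmetric([math.exp(b) for b in betas])
    weights = [math.exp(s * theta) * g for s, g in enumerate(gamma)]
    total = sum(weights)
    return [w / total for w in weights]


betas = [0.3, -0.5, 1.2]  # hypothetical item parameters
print(sum_score_dist_rasch(0.4, betas))
print(sum_score_dist_pcm(0.4, betas))
```

Both functions return the same distribution up to rounding error.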

Suppose that a test that consists of $k$ Rasch items is partitioned into
$T$ classes, consisting of $m_1, \ldots, m_T$ items. These classes will be
called testlets, and the sums of the item scores in each testlet will be
called testlet scores. The distributions of these testlet scores can be
described by the PCM because the original item responses are independent
and the classes are disjoint.


There are two important observations to be made in connection with
this result. First, if only testlet scores are observed instead of the
original item scores, then it is in principle possible - although not
easy - to estimate the original Rasch parameters from the sum scores. The
problem to be solved in case of Maximum Likelihood (ML) estimation is this:
find the values of the PCM parameters $\eta$ that maximize the likelihood
under the restriction that for each testlet $t$ there exist $m_t$ positive
real numbers $\varepsilon_{t1}, \ldots, \varepsilon_{tm_t}$ such that the
non-linear restrictions given by (7) hold for each testlet. If these
$\eta$-values are found, the $\varepsilon$-parameter estimates may be
found by solving for each testlet the system of non-linear equations
given by (7). But, even when one succeeds in finding ML-estimates for
the $\varepsilon$-parameters, it is not possible to associate them with the
original items. If all $m_t$ $\varepsilon$-parameters are distinct in
testlet $t$, then there are $m_t!$ different associations possible, and
there is no way of deciding between them on the basis of the testlet scores
alone.


The second observation is more important. Although it is true that
sums of Rasch item scores are distributed according to the PCM, the
converse is not true: polytomous item scores whose distribution is given
by the PCM cannot always be interpreted as sums of Rasch item scores. If
they were, it would follow that for $m$ arbitrary numbers
$\eta_1, \ldots, \eta_m$, there would exist $m$ (positive) real numbers
$\varepsilon_1, \ldots, \varepsilon_m$ such that (7) is true, and this
would be equivalent to claiming that all $m$-th degree polynomials with
positive coefficients have $m$ real-valued (negative) roots, which is not
true. This is why the ML estimation procedure loosely described in the
previous paragraph is difficult. We explain this in more detail.

Consider the polynomial of the $m$-th degree

$$P_m(x) = \prod_{i=1}^{m} (x + \varepsilon_i), \tag{9}$$

with all $\varepsilon_i$ real and positive. Obviously, the roots are real
and all negative (they equal $-\varepsilon_i$). Expanding (9) gives

$$P_m(x) = \gamma_0 x^m + \gamma_1 x^{m-1} + \gamma_2 x^{m-2} + \cdots + \gamma_m x^0, \tag{10}$$

where the coefficients $\gamma_s$, $(s = 0, \ldots, m)$ denote the
elementary symmetric functions as defined by (6). Finding the values of
$\varepsilon$ from the coefficients of the polynomial is equivalent to
finding its roots. Determining from the coefficients whether and how many
real roots do exist is an unsolved (and probably unsolvable) problem. A
necessary condition for the existence of $m$ real roots has been derived by
Isaac Newton (Hardy, Littlewood & Polya, 1952, theorem 51). It is
rephrased here as


Theorem 1 (Newton) If a polynomial $P_m$ as in (10) has real coefficients
$\gamma_0, \gamma_1, \ldots, \gamma_m$, then, if there are $m$ real roots,
it holds that

$$\frac{(s+1)(m-s+1)}{s(m-s)} \, \gamma_{s-1}\gamma_{s+1} \leq \gamma_s^2, \quad (s = 1, \ldots, m-1),$$

with equality holding only if all roots are equal.


For $m = 2$, the condition of the theorem is also sufficient for the
existence of real roots, but for higher degrees it is not, as the following
example shows. Set $\gamma_0, \ldots, \gamma_3$ to 1, 9, 25 and 17
respectively. It is easily checked that the two inequalities following from
the theorem are fulfilled, but the roots of the cubic polynomial are $-1$,
$-4 + i$ and $-4 - i$, i.e., there are two complex roots. Nevertheless, as
a necessary condition, the theorem puts severe restrictions on the
possibility to interpret PCM item scores as sums of Rasch item scores,
since in the PCM no restrictions whatsoever are put on the parameter space;
i.e., for a partial credit item with maximum score $m$, the parameter space
is $\mathbb{R}^m$. These restrictions led Van Engelenburg (1997) to the
conclusion that the PCM is not an adequate model to describe the
distribution of sums of binary item scores. It will be shown in the next
section that these restrictions are a direct consequence of assuming local
independence between the binary item responses.
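
The counterexample with coefficients 1, 9, 25 and 17 can be verified with a few lines of code. The sketch checks the inequalities of Theorem 1 and evaluates the cubic at the three stated roots; the helper names are illustrative only.

```python
def newton_inequalities_hold(gamma):
    """Check the necessary condition of Theorem 1 for gamma_0..gamma_m."""
    m = len(gamma) - 1
    for s in range(1, m):
        factor = (s + 1) * (m - s + 1) / (s * (m - s))
        if factor * gamma[s - 1] * gamma[s + 1] > gamma[s] ** 2:
            return False
    return True


def poly(gamma, x):
    """Evaluate gamma_0 x^m + ... + gamma_m at x by Horner's rule."""
    value = 0
    for g in gamma:
        value = value * x + g
    return value


gamma = [1, 9, 25, 17]
print(newton_inequalities_hold(gamma))             # both inequalities hold
print([poly(gamma, r) for r in (-1, complex(-4, 1), complex(-4, -1))])
```

The inequalities are satisfied (75 <= 81 and 459 <= 625), yet two of the three roots are complex, so the condition is necessary but not sufficient for $m = 3$.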


4 Models with dependent responses 


To model dependencies between item responses, it is easier to model
whole response patterns than merely item responses, because dependence
means lack of local independence, and therefore impossibility of
multiplying item response functions.

As before, we assume that the test consists of $k$ binary items, and
is partitioned into $T$ testlets, containing $m_1, \ldots, m_T$ items
respectively. As most of the discussion to come will focus on a single
testlet, explicit reference to the testlet number will be dropped.

Consider a testlet consisting of $m$ (> 1) items. The vector
$Y = (Y_1, \ldots, Y_m)$ with realizations $y = (y_1, \ldots, y_m)$ will be
called the response pattern. The random variable $S$, with realizations
$s$, defined by

$$S = S(Y) = \sum_{i=1}^{m} Y_i \tag{11}$$

is called the testlet score. Define the $m$ sets $I_g$,
$(g = 1, \ldots, m)$ as the sets containing all ordered $g$-tuples of the
numbers $1, \ldots, m$. This means $I_1 = \{1, \ldots, m\}$,
$I_2 = \{(1,2), \ldots, (1,m), (2,3), \ldots, (m-1,m)\}$, etc. The
cardinality of $I_g$ is $\binom{m}{g}$. The general model that will be
studied is given




N. D. Verhelst and H. H. F. M. Verstralen 


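by

$$P(Y = y \mid \theta) \propto \exp\Big[ s\theta + \sum_{i \in I_1} y_i \beta_i + \sum_{(i,j) \in I_2} y_i y_j \beta_{ij} + \cdots + \sum_{I_m} y_1 \cdots y_m \beta_{12 \cdots m} \Big], \tag{12}$$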

and by the assumption of local independence between testlet response
patterns. Notice that the last sum in the right-hand side of (12) has only
one term; it is written as a sum to make the structure of the model clear.
The model is a generalization of the Rasch model; if all
$\beta$-parameters having two or more subscripts are set to zero, the Rasch
model results. The extra parameters catch possible interactions between
items, and if one of them is non-zero, local independence is lost.

Model (12) and several submodels resulting from setting interaction
parameters to zero have been studied by Kelderman (1984); see also
Verhelst & Glas (1995a). It should be stressed that this model and
various submodels are estimable if the item responses are observed. What
matters here, however, is to see what happens if only the testlet scores
$S_t$, $t = 1, \ldots, T$, are observed.

4.1 Testlet scores modeled by the PCM 

Since testlet scores are assumed to be independent given $\theta$, it
suffices to consider a single testlet (without reference to its number
$t$). Taking the sum of (12) over all response patterns with testlet score
$s$ gives

$$P(S = s \mid \theta) \propto \exp(s\theta) \times \sum_{\sum z = s} \exp\Big[ \sum_{I_1} z_i \beta_i + \sum_{I_2} z_i z_j \beta_{ij} + \cdots + \sum_{I_m} z_1 \cdots z_m \beta_{12 \cdots m} \Big]. \tag{13}$$

Notice that in the preceding expression the vector
$z = (z_1, \ldots, z_m)$ does not refer to any observed response pattern:
it is to be understood as the generic expression for a response pattern
within the testlet. The outer sum in (13) runs over all response patterns
having a testlet sum score of $s$.

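To elucidate the structure of Expression 13 and its importance, we
write it with another parameterization. Define

$$\varepsilon_i = \exp(\beta_i); \quad \varepsilon_{ij} = \exp(\beta_{ij}); \quad \ldots; \quad \varepsilon_{12 \cdots m} = \exp(\beta_{12 \cdots m}),$$

and the vector $\varepsilon^*$ as

$$\varepsilon^* = (\varepsilon_1, \ldots, \varepsilon_m, \varepsilon_{12}, \ldots, \varepsilon_{m-1,m}, \ldots, \varepsilon_{12 \cdots m}).$$

Furthermore, define the combinatorial function
$\Gamma_s(\varepsilon^*)$ as

$$\Gamma_s(\varepsilon^*) = \sum_{\sum z = s} \; \prod_{I_1} \varepsilon_i^{z_i} \times \prod_{I_2} \varepsilon_{ij}^{z_i z_j} \times \cdots \times \prod_{I_m} \varepsilon_{12 \cdots m}^{z_1 \cdots z_m}, \tag{14}$$

so that (13) can be written as

$$P(S = s \mid \theta) \propto \exp(s\theta) \times \Gamma_s(\varepsilon^*). \tag{15}$$

For $m = 3$ the sum in the right-hand side of (14) is displayed, term by
term, for the three possible patterns that have a score of 1 or 2 (see
Table 2). For a score of zero, the sum has one term equal to 1, and for a
score of 3, the sum also consists of a single term equal to the product of
all $\varepsilon$-parameters.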


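Table 2: Illustration of (14)

          score = 2                                          score = 1
pattern   term                                               pattern   term
1 1 0     $\varepsilon_1 \varepsilon_2 \varepsilon_{12}$     1 0 0     $\varepsilon_1$
1 0 1     $\varepsilon_1 \varepsilon_3 \varepsilon_{13}$     0 1 0     $\varepsilon_2$
0 1 1     $\varepsilon_2 \varepsilon_3 \varepsilon_{23}$     0 0 1     $\varepsilon_3$
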
This makes clear that the value of the sum depends on the value of
the $\varepsilon$-parameters and on $s$, but not on any specific response
pattern that leads to the testlet score of $s$, whence it follows that the
second factor in the right-hand side of (15) is a function of the
$\varepsilon$-parameters and the score $s$. Since it is a sum of
exponentials, it is positive, and therefore we can write it as
$\exp(-\eta_s(\varepsilon^*))$ or $\exp(-\eta_s)$ for short. Moreover, it
is clear from (13) that $\eta_0 = 0$. With this notation, (15) can be
written as

$$P(S = s \mid \theta) \propto \exp(s\theta - \eta_s), \tag{16}$$

which is formally equivalent to the PCM.

This result is summarized as 

Theorem 2 For any value of the $\varepsilon^*$-parameters in the
dependence model (12), and for all testlets consisting of $m$ binary items,
there exists a set of $m$ functions $\eta_1, \ldots, \eta_m$ such that the
distribution of the testlet score $S$ in the dependence model is identical
to its distribution under the PCM with parameter values
$\eta_1, \ldots, \eta_m$. These functions are given by

$$\eta_s = -\ln \Gamma_s(\varepsilon^*), \quad (s = 1, \ldots, m),$$

where $\Gamma_s(\varepsilon^*)$ is defined by (14).

The number of elements in $\varepsilon^*$ is
$\sum_g |I_g| = 2^m - 1$, so that the parameter space of the dependence
model (with the $\varepsilon$-parameterization) is
$\mathbb{R}_+^{2^m - 1}$. What the theorem says is that the functions
$(\eta_1, \ldots, \eta_m)$ considered jointly define a vector-valued
function from $\mathbb{R}_+^{2^m - 1}$ into $\mathbb{R}^m$, the
parameter space of the PCM at the testlet level. In Figure 1, this result
is displayed graphically. The left-hand ellipse represents the parameter
space of the dependence model and a dot represents an
$\varepsilon^*$-vector. For each such vector there is a (unique) vector in
the parameter space of the PCM (right-hand ellipse) representing the
equivalent model (at the testlet score level) in the PCM-family.
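
A brute-force sketch of the mapping in Theorem 2 is given below. It enumerates all $2^m$ response patterns of a testlet, accumulates the function defined in (14) per testlet score, and returns the PCM parameters as minus its logarithm. Representing the parameters as a dictionary keyed by index tuples, and treating missing tuples as zero, are conventions chosen for this illustration only; the numerical values are hypothetical.

```python
import math
from itertools import combinations, product


def capital_gamma(beta, m):
    """Return [Gamma_0, ..., Gamma_m] of (14) by enumerating all patterns.

    beta maps index tuples such as (1,), (1, 2) or (1, 2, 3) to main-effect
    and interaction parameters; tuples not present count as zero.
    """
    totals = [0.0] * (m + 1)
    for z in product((0, 1), repeat=m):
        on = [i + 1 for i, zi in enumerate(z) if zi]  # items answered correctly
        exponent = 0.0
        for g in range(1, len(on) + 1):
            for h in combinations(on, g):  # products z_i...z_j that equal one
                exponent += beta.get(h, 0.0)
        totals[sum(z)] += math.exp(exponent)
    return totals


beta = {
    (1,): 0.2, (2,): -0.1, (3,): 0.4,
    (1, 2): 0.5, (1, 3): -0.3, (2, 3): 0.0,
    (1, 2, 3): 0.7,
}
Gamma = capital_gamma(beta, 3)
eta = [-math.log(G) for G in Gamma]  # PCM parameters, with eta[0] == 0
print(eta)
```

With all interaction parameters set to zero, the same function reduces to the elementary symmetric functions of the Rasch case.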

Figure 1: The relationship between the parameter space of the dependence
model and the PCM. (Ellipse labels: Dependence, PCM.)


This is the main result of this paper: a fairly complicated model for
binary responses (the model defined by (12)) can be fitted by using the
PCM at the level of testlets. The number of parameters $\eta_s$ to be
estimated is the same as in the Rasch model, but the assumptions are far
weaker: complicated patterns of item dependency within testlets are
automatically absorbed in the PCM-parameters $\eta_s$. Moreover, the
sufficient statistic for the latent variable, the raw score, is the same as
in the Rasch model.

4.2 The PCM for testlets as a model for sums of 
binary scores 

There remains, however, a complementary problem, which can be seen
from Figure 1: in the right-hand ellipse (the parameter space of the PCM)
there are dots which are not at the end-point of an arrow, symbolizing
vectors in the parameter space of the PCM which cannot be written
as the $\eta$-transformation of any $\varepsilon^*$-vector in the parameter
space of the dependence model. The question to be answered is whether such
$\eta$-vectors can exist. If they cannot, then we have the result that
every partial credit score in the PCM can be interpreted as a sum of $m$
binary item scores, where the distribution of these binary scores is given
by the dependence model (12). In the remaining part of this section, it is
shown that this is indeed the case.

Since the second factor in the right-hand side of (13) defies
simplification, a number of restrictions on the $\beta$-parameters will be
introduced which yield a more tractable expression, and yet result in a
model which covers the parameter space of the PCM. Specifically, we will
assume all interaction parameters of the same order to be equal, i.e.,

$$\beta_h = \lambda_g \quad \text{for all } h \in I_g, \quad (g = 2, \ldots, m). \tag{17}$$

Formally, by applying these restrictions we consider a subspace of the
original parameter space of the dependence model. Where the original
space has dimension $2^m - 1$, the restricted subspace has dimension
$2m - 1$, because there are $m$ $\beta$-parameters with a single subscript
and $m - 1$ interaction parameters, $\lambda_2, \ldots, \lambda_m$.

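Using the restrictions (17) and the fact that all $g$-fold products
$z_{i_1} \times \cdots \times z_{i_g}$ vanish if $g > s(y)$, and equal one
in exactly $\binom{s}{g}$ cases if $g \leq s(y)$, (12) can be rewritten as

$$P(Y = y \mid \theta) \propto \exp\Big[ s\theta + \sum_{i \in I_1} y_i \beta_i + \sum_{g=2}^{s} \binom{s}{g} \lambda_g \Big], \tag{18}$$

whence it follows that (13) simplifies to

$$P(S = s \mid \theta) \propto \exp(s\theta) \times \exp\Big[ \sum_{g=2}^{s} \binom{s}{g} \lambda_g \Big] \times \sum_{\sum z = s} \prod_i \varepsilon_i^{z_i}
= \exp(s\theta) \times \exp\Big[ \sum_{g=2}^{s} \binom{s}{g} \lambda_g \Big] \times \gamma_s(\varepsilon). \tag{19}$$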



About the PCM 




Define

$$\eta_s = -\ln \gamma_s(\varepsilon) - \sum_{g=2}^{s} \binom{s}{g} \lambda_g, \quad (s = 1, \ldots, m), \tag{20}$$

where the sum in the right-hand side of (20) is defined to be zero if
$s < 2$.

Now it is easy to show that for any ordered set of $m$ $\eta$-values it is
always possible to find $\varepsilon$- and $\lambda$-values such that (20)
is fulfilled. The values for the $\varepsilon$-parameters can be taken
arbitrarily from the positive reals, with the only restriction that minus
the logarithm of their sum equals $\eta_1$. In this way (20) is fulfilled
for $s = 1$. The $\lambda$-values are given by sequentially applying (from
(20)):

$$\lambda_s = -\ln \gamma_s(\varepsilon) - \eta_s - \sum_{g=2}^{s-1} \binom{s}{g} \lambda_g, \quad (s = 2, \ldots, m). \tag{21}$$

We illustrate this by a simple example for $m = 2$. Suppose $\eta_1 = 0$
and $\eta_2 = 2$. Consider the following two $\varepsilon$-vectors:
$\varepsilon^{(1)} = (0.7, 0.3)$ and $\varepsilon^{(2)} = (0.9, 0.1)$. It
holds that $\gamma_1(\varepsilon^{(1)}) = \gamma_1(\varepsilon^{(2)}) = 1$,
complying in both cases with the restriction that
$\eta_1 = -\ln \gamma_1(\varepsilon)$. The basic symmetric functions of
order 2, however, are not equal in both cases, as
$\gamma_2(\varepsilon^{(1)}) = 0.21$ and
$\gamma_2(\varepsilon^{(2)}) = 0.09$. Applying (21) in both cases, we find

$$\lambda_2^{(1)} = -\ln(0.21) - 2 = -0.439 \quad \text{and} \quad \lambda_2^{(2)} = -\ln(0.09) - 2 = +0.408$$

and therefore, the two $\varepsilon^*$-vectors $(0.7, 0.3, \exp(-0.439))$
and $(0.9, 0.1, \exp(0.408))$ are transformed into the same $\eta$-vector
$(0, 2)$. This result is stated formally as

Theorem 3 The $m$-valued function $(\eta_1, \ldots, \eta_m)$ defined by
(14) over a subspace of the parameter space of the dependence model,
defined by (17), is a function from $\mathbb{R}_+^{2m-1}$ onto
$\mathbb{R}^m$.
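
The constructive argument behind this theorem can be sketched in code: given target values $\eta_1, \ldots, \eta_m$ and any positive main-effect parameters whose sum matches $\eta_1$, the recursion (21) yields interaction parameters that reproduce the targets through (20). The function names are hypothetical; `math.comb` requires Python 3.8 or later.

```python
import math


def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_m] for the list eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        for s in range(len(eps), 0, -1):
            gamma[s] += e * gamma[s - 1]
    return gamma


def solve_lambdas(eta, eps):
    """Apply (21): return {g: lambda_g} for g = 2..m.

    eta is [eta_1, ..., eta_m]; eps must satisfy eta_1 = -ln(sum(eps)).
    """
    gamma = elementary_symmetric(eps)
    lam = {}
    for s in range(2, len(eps) + 1):
        lower = sum(math.comb(s, g) * lam[g] for g in range(2, s))
        lam[s] = -math.log(gamma[s]) - eta[s - 1] - lower
    return lam


def eta_from(eps, lam):
    """Apply (20): return [eta_1, ..., eta_m] for given eps and lambdas."""
    gamma = elementary_symmetric(eps)
    return [
        -math.log(gamma[s])
        - sum(math.comb(s, g) * lam[g] for g in range(2, s + 1))
        for s in range(1, len(eps) + 1)
    ]


# The two epsilon-vectors of the m = 2 example, both mapped to eta = (0, 2).
lam_1 = solve_lambdas([0.0, 2.0], [0.7, 0.3])
lam_2 = solve_lambdas([0.0, 2.0], [0.9, 0.1])
print(lam_1[2], lam_2[2])
```

The two runs give different interaction parameters for the same target vector, which illustrates why the mapping cannot be one-one.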

The meaning of this theorem is graphically displayed in Figure 2.

The restricted subspace is symbolized by the area in the left-hand ellipse
to the right of the waved line. From Theorem 2, we know that there
exists an arrow from all points in this subspace to a unique point in the
parameter space of the PCM. In Theorem 3, it is stated that all points
in the parameter space of the PCM are the endpoints of such an arrow.
Since $2m - 1 > m$ if $m > 1$, this function cannot be one-one; therefore
more than one arrow ends in every point of the PCM space.


Figure 2: The relationship between the parameter space of the restricted
dependence model and the PCM. (Ellipse labels: Dependence, PCM; the marked
region in the left-hand ellipse is labeled "Restricted parameter space".)


In summary, it has been shown that every model in the family defined
by (12) is formally equivalent to the PCM when the distribution of the
testlet score is modeled (Theorem 2), and conversely, that every PCM
can be understood as a model for the testlet score, where the joint
distribution of the item responses within the testlet is given by (12)
(Theorem 3). If the item responses are observed, then (12) is identified
and the parameters may be estimated; if only sums of item scores are
observed, however, model (13) results, and the model is no longer
identifiable, because there are more parameters than different values of
the score. Only functions of these parameters are estimable, for example,
the functions given by (14) and one-one transformations of these functions.

The practical implication of this result is discussed in the next section. 


5 Practical implications 

In applications of the Rasch model, one can focus on different aspects, 
either paying attention to the structure of the model itself, or focusing on 
its application, i.e. on the inferences one can make on the latent abilities 
of concrete persons or groups of persons. 

An example of the first is the research with the so-called Linear
Logistic Test Model (LLTM) (see for example Fischer, 1995; Bechger,
Verstralen & Verhelst, 2002), where the item parameters are considered as
linear combinations of a (small) number of so-called basic parameters. In
these models local independence between item responses is an essential
part of the model, and estimates of the parameter values require that
data are available at item level. Detecting that the assumption of local
independence is violated in a concrete application of the LLTM invalidates
the model immediately, and the results obtained in the previous section
cannot be put to use.

There exist, however, other applications where the use of IRT serves
a more practical purpose. We take a survey, like national or international
assessment in education, as a typical situation. There the focus is on the
distribution of the target latent variable (e.g., reading literacy) in
populations and subpopulations, for example, the comparison of the literacy
distributions of boys and girls, in subpopulations that vary in
socio-economic status, across different countries and over time. The
practical value of using an IRT-approach is that it allows one to include
much more item material than can be responded to by a single testee, and
that it allows one to include new item material over time, and at the same
time guarantee invariance of the measured concept, although new and old
material may differ in difficulty. A large scale project where the Rasch
model has been used as IRT model is the PISA project (Adams, 2002). The
practical advantage of the results reported in the preceding section is
that it does not matter whether the assumption of local independence holds
or does not hold, as long as such dependencies are correctly modelled.
Applying the PCM at the testlet level is an easy way to capture arbitrary
dependencies between items of a testlet, without the necessity of
unraveling and testing the precise nature and extent of such dependencies.

Two questions, however, remain to be answered. The first concerns
the possible loss in information when one models testlet scores instead
of item scores. The second has to do with the vagueness of the notion
of testlet in the preceding section. The results were shown to be valid
independently of the way the testlets were defined, as long as the testlets
were disjoint and the testlet scores locally independent, but it is
not a trivial problem to form such a collection of testlets in a practical
application. These two problems will be discussed in turn.

5.1 Loss of information 

One might be worried that, if the Rasch model holds, the use of the PCM
at the testlet level will lead to information loss, i.e., that the accuracy
of the latent variable estimates (or its distribution) will be weaker when
based on the PCM rather than on the (correct) Rasch model. There
is, however, no reason for such a worry. Both the Rasch model and
the PCM are exponential family models, and for such models it
holds that the Fisher information equals the variance of the sufficient
statistic (Barndorff-Nielsen, 1978). The commonly used estimate for the
standard error of the $\theta$-estimate is the square root of one divided
by the information. In both models, the sufficient statistic for $\theta$
is the sum of the testlet scores, and from a comparison of (5) and (8), we
see that the distribution of the sufficient statistic for any value of
$\theta$ is the same in both models, and therefore the variance is the same
as well. In case the Rasch model is valid, the PCM is just a
reparameterization of the Rasch model, defined by (7), and the standard
errors of the $\theta$-estimates are identical under both models.

But what if the Rasch model is not valid? If the dependence model
(12) is valid, but not the Rasch model, then the Fisher information can
be determined correctly from it. Of course, the parameters must be
estimated from a finite set of data, such that one will obtain only an
estimate of the Fisher information. This estimate, however, is consistent.
If one estimates the variance of the scores using the incorrect Rasch
model, the result cannot be interpreted as the Fisher information since
the measurement model is not valid, so that comparisons with the Fisher
information under the PCM are meaningless.

A related, but quite different question is whether tests with dependent
items lead to more or less accuracy of the $\theta$-estimates than tests
that comply with the Rasch model. The answer to this question is not
simple, as is shown by the following illustration. Suppose $m = k = 2$ and
the parameters $\beta_1$ and $\beta_2$ are both equal to zero. Now consider
three models with these parameters fixed, and the interaction parameter
$\beta_{12}$ taking the values 0, $-0.5$ and $+0.5$ respectively, as
examples of the Rasch model, a dependence model with negative and a
dependence model with positive first order interaction respectively. The
information functions of these three models are displayed graphically in
Figure 3.

The information function for the Rasch model (the solid curve) shows
a well-known characteristic of all IRT-models: the accuracy with which
$\theta$ can be estimated depends on the value of $\theta$ itself. In
Figure 3, we see that most information is conveyed for
$\theta = \beta_1 = \beta_2$. For the dependence models, two
characteristics are important, and have been shown to be stable for a wide
range of parameter values for which similar figures have been scrutinized.

The hrst is the maximum information of the model. The maxima 
are located at different places, and it appears that the lower value the 
of the interaction parameter, the higher the location of maximum in- 
formation. The maximal information itself, however, seems to correlate 
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Figure 3: Information functions for a two item test with zero, positive 
and negative interaction. 


positively with the interaction parameter f3i2- the larger this parameter, 
the larger the maximal information. 

The second characteristic is that all pairs of curves in the figure
intersect. This means that no model has information that is uniformly
higher or lower than that of another model. For example, the model
in Figure 3 with the lowest modal information ($\beta_{12} = -0.5$) has
higher information than the other two for $\theta > 1$.

With more items in a testlet, with more than one testlet and more 
complicated interactions, it might be far more difficult to describe in 
general terms the effect of interactions on the information function. 
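
The two-item illustration can be checked numerically. The following is a minimal sketch, not the computation behind Figure 3: it assumes that the interaction parameter enters the exponent of the response-pattern probability with a positive sign (the general model (12) is not restated in this section), and it uses the fact that for such an exponential family model the information on $\theta$ equals the variance of the sum score. The function name `score_information` is hypothetical.

```python
import math

def score_information(theta, beta1=0.0, beta2=0.0, beta12=0.0):
    """Information on theta for a two-item test, computed as Var(X1 + X2 | theta).

    Assumption: P(x1, x2 | theta) is proportional to
    exp((x1 + x2) * theta - x1 * beta1 - x2 * beta2 + x1 * x2 * beta12).
    """
    weights = {}
    for x1 in (0, 1):
        for x2 in (0, 1):
            log_w = (x1 + x2) * theta - x1 * beta1 - x2 * beta2 + x1 * x2 * beta12
            weights[(x1, x2)] = math.exp(log_w)
    total = sum(weights.values())
    mean = sum((x1 + x2) * w for (x1, x2), w in weights.items()) / total
    mean_sq = sum((x1 + x2) ** 2 * w for (x1, x2), w in weights.items()) / total
    return mean_sq - mean ** 2

# Locate the maximum of each curve on a grid of theta values.
grid = [i / 100 for i in range(-300, 301)]
for b12 in (-0.5, 0.0, 0.5):
    best_info, best_theta = max((score_information(t, beta12=b12), t) for t in grid)
    print(f"beta12={b12:+.1f}: maximum {best_info:.3f} at theta={best_theta:+.2f}")
```

Under the stated sign assumption, the sketch reproduces the qualitative pattern described above: the maximum rises with the interaction parameter, its location moves in the opposite direction, and the curves cross.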

5.2 Detecting interactions

In practical applications, it may not always be easy to detect sets of
items where dependence is likely to occur. The most likely candidates
are items formulated as questions about the same stem, as is often the
case in reading tests. But other dependencies may occur as well, for
example in cases where the presence of an item, item i, say, in a linear
test contains clues for the solution of another item j. Two methods are
discussed to find out whether dependencies are present or not.

The first one starts from a Rasch analysis, where independence is
assumed. If conditional maximum likelihood (CML) is used as the
estimation method, it is fairly simple to construct the matrix of
predicted pairwise frequencies of correct responses. The expression is


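$$E(n_{ij}) = \sum_{s=2}^{k-1} n_s \, \frac{\hat{\varepsilon}_i \hat{\varepsilon}_j \, \gamma_{s-2}^{(i,j)}(\hat{\boldsymbol{\varepsilon}})}{\gamma_s(\hat{\boldsymbol{\varepsilon}})} \qquad (22)$$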


where $n_s$ is the frequency of score s in the sample,
$\gamma_s(\hat{\boldsymbol{\varepsilon}})$ is the gamma function (the
elementary symmetric function) of order s evaluated at the
CML-estimates, and $\gamma_{s-2}^{(i,j)}(\hat{\boldsymbol{\varepsilon}})$
is the gamma function of order s - 2, evaluated on the vector of
$\varepsilon$-parameters, where $\varepsilon_i$ and $\varepsilon_j$ are
excluded. Response patterns with a score of zero or one are not counted
because for these it is impossible to have both items correct, and score
k is excluded because the probability of having items i and j correct
trivially equals one. Simple or weighted comparison between observed and
expected pairwise frequencies may reveal pairs of items where the
covariation is too high or too low to be compatible with the assumption
of independence. A suitable weighted comparison is


$$z_{ij} = \pm \sqrt{\frac{n^* \, [n_{ij} - E(n_{ij})]^2}{E(n_{ij}) \, [n^* - E(n_{ij})]}} \qquad (23)$$


where the sign is the same as the sign of the difference in the numerator
of (23), and $n^* = \sum_{s=2}^{k-1} n_s$. The quantity $z_{ij}^2$ is
readily recognized as the common chi-square statistic computed on a
2 x 1 contingency table with observed frequencies $n_{ij}$ and
$n^* - n_{ij}$ respectively. Its signed square root, $z_{ij}$, is
approximately standard normally distributed.
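
The computation in (22) and (23) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, items are indexed from zero, the gamma functions are computed with the usual summation recursion for elementary symmetric functions, and `n_star` is taken to be the number of respondents with a score from 2 to k - 1.

```python
import math

def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] for the item parameters eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for count, e in enumerate(eps, start=1):
        for s in range(count, 0, -1):      # update high orders first
            gamma[s] += e * gamma[s - 1]
    return gamma

def expected_pair_count(eps, score_counts, i, j):
    """Equation (22): expected number of persons with items i and j both correct.

    score_counts[s] is the number of persons with total score s (s = 0..k).
    """
    k = len(eps)
    gamma = elementary_symmetric(eps)
    # Gamma functions on the parameter vector with items i and j removed.
    rest = elementary_symmetric([e for idx, e in enumerate(eps) if idx not in (i, j)])
    return sum(
        score_counts[s] * eps[i] * eps[j] * rest[s - 2] / gamma[s]
        for s in range(2, k)               # scores 0, 1 and k are excluded
    )

def signed_z(n_obs, n_exp, n_star):
    """Equation (23): signed square root of the 2 x 1 chi-square statistic."""
    chi2 = n_star * (n_obs - n_exp) ** 2 / (n_exp * (n_star - n_exp))
    return math.copysign(math.sqrt(chi2), n_obs - n_exp)

# Hypothetical data: four equally difficult items, score frequencies for s = 0..4.
eps = [1.0, 1.0, 1.0, 1.0]
score_counts = [5, 10, 60, 40, 7]
n_star = sum(score_counts[2:4])
expected = expected_pair_count(eps, score_counts, 0, 1)
print(expected, signed_z(40, expected, n_star))
```

A large positive value of the statistic for a pair of items suggests more joint successes than independence predicts, a large negative value fewer.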

The second method starts from a PCM analysis and can help in
deciding whether the scores on a testlet with maximum score m can be
conceived as a sum of m Rasch items. One can proceed along the
following lines:


1. Using (7), the PCM parameter estimates can be converted to the
coefficients of the m-th degree polynomial $P_m$ given by (10). Using
a numerical root finder, one can find all roots of $P_m$. If they are
all real (and negative by necessity), the Rasch estimates of the
parameters $\varepsilon$ are given by minus the roots.


2. If not all roots are real, this may be caused by genuine
dependencies, but also by sampling error. So we might wish to have a
statistical test that enables us to reject the latter hypothesis. It
appears to be quite hard to construct such a test, and we did not
find a solution to this problem. We can, however, construct a more
conservative test, by using Theorem 1 and (7). The null hypothesis,
i.e., the Rasch model, can be written in the following two equivalent
forms

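$$H_0:\ \frac{(s+1)(m-s+1)}{s(m-s)} \times \frac{\gamma_{s-1}(\boldsymbol{\varepsilon}) \, \gamma_{s+1}(\boldsymbol{\varepsilon})}{\gamma_s^2(\boldsymbol{\varepsilon})} \le 1, \qquad (s = 1, \ldots, m-1),$$

or

$$H_0:\ d_s = 2\eta_s - \eta_{s-1} - \eta_{s+1} + \ln \frac{(s+1)(m-s+1)}{s(m-s)} \le 0, \qquad (s = 1, \ldots, m-1). \qquad (24)$$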

The Wald test statistics are

$$W_s = \frac{\hat{d}_s^{\,2}}{t' \hat{\Sigma}_s t}, \qquad (s = 1, \ldots, m-1), \qquad (25)$$

where $\hat{d}_s$ equals $d_s$ evaluated at the ML-estimates,
$\hat{\Sigma}_s$ is the estimated variance-covariance matrix of
$\hat{\eta}_{s-1}$, $\hat{\eta}_s$ and $\hat{\eta}_{s+1}$ (in that
order) and $t' = (-1, 2, -1)$. $W_s$ is asymptotically chi-square
distributed with one degree of freedom, and therefore its signed square
root is asymptotically standard normally distributed. The sign of the
square root is the sign of $\hat{d}_s$. If s = 1, the first row and
column of $\hat{\Sigma}_s$ consist of zeros, since
$\hat{\eta}_0 = \eta_0 = 0$. The null hypothesis is rejected at the 5%
level of significance if $W_s > 1.96^2$ and $\hat{d}_s > 0$.
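
The two steps can be sketched in code. This is a minimal illustration under assumptions that are not confirmed by this section: relation (7) is taken to be $\gamma_s = \exp(-\eta_s)$ and the polynomial (10) to be $P_m(x) = \sum_s \gamma_s x^{m-s}$, both inferred from the form of (24). The Durand-Kerner iteration is merely one possible choice of root finder, and all function names are hypothetical.

```python
import math

def pcm_to_gamma(eta):
    """Assumed form of (7): gamma_s = exp(-eta_s), with gamma_0 = 1."""
    return [1.0] + [math.exp(-e) for e in eta]

def polynomial_roots(coeffs, iterations=200):
    """All complex roots by Durand-Kerner; coeffs run from the leading term down.
    Adequate for distinct roots; repeated roots converge slowly."""
    n = len(coeffs) - 1
    monic = [c / coeffs[0] for c in coeffs]
    roots = [(0.4 + 0.9j) ** p for p in range(n)]
    for _ in range(iterations):
        for i in range(n):
            value = 0j
            for c in monic:                  # Horner evaluation at roots[i]
                value = value * roots[i] + c
            denom = 1 + 0j
            for j in range(n):
                if j != i:
                    denom *= roots[i] - roots[j]
            roots[i] -= value / denom
    return roots

def rasch_from_pcm(eta, tol=1e-8):
    """Step 1: return the sorted epsilon values if all roots are real, else None."""
    roots = polynomial_roots(pcm_to_gamma(eta))
    if any(abs(r.imag) > tol for r in roots):
        return None
    return sorted(-r.real for r in roots)

def wald_tests(eta, cov):
    """Step 2: eta = [eta_1..eta_m], cov = their estimated covariance matrix.
    Returns (s, d_s, W_s, reject) for s = 1..m-1, following (24) and (25)."""
    m = len(eta)
    full = [0.0] + list(eta)                 # eta_0 = 0 by convention
    t = (-1.0, 2.0, -1.0)

    def cov_entry(a, b):
        # eta_0 is a fixed constant, so its row and column are zero.
        return 0.0 if a == 0 or b == 0 else cov[a - 1][b - 1]

    results = []
    for s in range(1, m):
        log_const = math.log((s + 1) * (m - s + 1) / (s * (m - s)))
        d = 2 * full[s] - full[s - 1] - full[s + 1] + log_const
        idx = (s - 1, s, s + 1)
        var = sum(t[p] * t[q] * cov_entry(idx[p], idx[q])
                  for p in range(3) for q in range(3))
        w = d * d / var
        results.append((s, d, w, w > 1.96 ** 2 and d > 0))
    return results

# Hypothetical testlet with m = 2: estimates and covariance matrix are made up.
print(rasch_from_pcm([0.0, 0.0]))
print(wald_tests([0.0, 0.0], [[0.01, 0.005], [0.005, 0.02]]))
```

In practice the covariance matrix would come from the estimation software; the values above only serve to exercise the functions.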


6 Discussion 

In this section, the results of the preceding sections are summarized and 
some comments are added. 

1. The partial credit model is not suited to describe difficulties of item 
steps. In complex items, where steps can be distinguished, there is 
no invariant relation between parameter values and the difficulty 
of the steps. This means that the set of parameters associated 
with a partial credit item should be considered as a joint formal 
description of the item as a whole. 

2. If the Rasch model holds for a set of k items, the PCM also holds for
every partition of the original k item scores into T sum scores defined
on T testlets (subsets of items) of arbitrary size. T is arbitrary
too. Moreover, there exists a well specified non-linear relationship
between the Rasch model parameters and the PCM parameters,
given by (7). Although the Rasch parameters can be recovered
uniquely from the PCM parameters, it is impossible to associate
these values with particular Rasch items, because any permutation of
the Rasch parameters of a testlet leads to the same likelihood.

3. One should be careful not to confuse the algebraic equivalence of 
two models with relations between parameter estimates. We give 
two comments in this respect. 
• Suppose the Rasch model holds, and m = 2 for some testlet.
Then it follows from Newton's theorem that for the testlet it
holds that $\eta_2 \ge 2\eta_1 + \ln 4$. But if one estimates the
parameters $\eta_1$ and $\eta_2$ from a finite data set, even if it is
known to comply with the Rasch model, as with artificially generated
data, there is nothing that guarantees that this inequality is
fulfilled by the estimates. The only thing that is known for
sure is that the probability that the inequality is violated goes
to zero as the sample size increases without bound. Since the Rasch
model is the PCM restricted by such inequalities, the maximum of the
likelihood function using the PCM at the testlet level will never be
smaller than the maximum using the Rasch model. To decide whether the
assumption of local independence is credible, one will have to use a
statistical test procedure like the one proposed in Section 5.2.

• Although the results discussed are also valid (at the algebraic
level) in case T = 1, this case cannot be tested empirically,
because CML-estimates in the PCM do not exist if the test is
composed of one partial credit item.

4. In Section 4, a model for binary items is presented that allows for
complicated dependencies between item responses. If such
dependencies are restricted to subsets of m items, it is shown that such
a model is equivalent to the PCM if testlet scores are modelled
instead of binary responses. Moreover it is shown that each PCM
may be interpreted in this way. This does not imply, however,
that such an interpretation also has substantive meaning. The
general model (12) is overparameterized if only testlet scores are
observed, and an interpretation in terms of these many parameters
is a possibility, but certainly not the only one.


5. The practical use of the results mainly resides in the possibility
of ignoring complicated dependencies between item responses without
losing information about the underlying latent variable. Two
methods have been proposed to detect such dependencies, such that
the testlet definitions may be adequately chosen.

To conclude, we add a warning against overoptimism. Even if one 
would succeed completely in identifying subsets of binary items such that 
the resulting testlet scores are locally independent, this does not imply 
that the PCM at the testlet score level is the correct model. More general 
models like the generalized PCM, allowing for different discriminations of 
testlets, or multidimensional models, or even totally different approaches 
might point to weaknesses in the simple PCM. There is plenty of room 
for sustained theoretical research. 